AMENDMENTS TO THE CLAIMS:

This listing of claims will replace all prior versions, and listings, of claims in the
application:
Listing of Claims:

1. (currently amended) A method for predicting a polyadenylation site comprising:
inputting a plurality of RNA transcript sequences or [[sequences derived from RNA
transcript sequences]], wherein at least one sequence has its poly A or poly T tract
sequence;

searching for a polyadenylation site, wherein the polyadenylation is an adenine rich
region at the 3' end of the sequence or a thymine rich region at the [[beginning]] 5' end of
the sequence;

detecting the presence of polyadenylation signals neighboring the polyadenylation site by
[[scanning]] analyzing [[the]] EST or RNA sequences or their corresponding genomic DNA
sequences.



2. (currently amended) The method of Claim 1 wherein the step of searching for a
polyadenylation site [[comprising]] comprises [[scanning]] analyzing the sequences for adenine
rich region at the 3' end of the sequence or a thymine rich region at the [[beginning]] 5' end
of the sequence.

3. (original) The method of Claim 2 wherein the adenine rich region comprises
adenine in at least 50% of the region and the thymine rich region comprises thymine in at
least 50% of the region.

4. (original) The method of Claim 2 wherein the adenine rich region comprises
adenine in at least 60% of the region and the thymine rich region comprises thymine in at
least 60% of the region.

5. (original) The method of Claim 2 wherein the adenine rich region comprises
adenine in at least 70% of the region and the thymine rich region comprises thymine in at
least 70% of the region.

6. (original) The method of Claim 2 wherein the adenine rich region comprises
adenine in at least 80% of the region and the thymine rich region comprises thymine in at
least 80% of the region.
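
For illustration only, the percentage thresholds recited in Claims 3 to 6 can be sketched as a simple composition test. The function names `base_fraction` and `is_rich_region` are hypothetical, and how the candidate region itself is delimited is not specified here, so the sketch assumes the region is already given as a string.

```python
def base_fraction(region: str, base: str) -> float:
    """Fraction of positions in `region` equal to `base` (case-insensitive)."""
    if not region:
        return 0.0
    return region.upper().count(base.upper()) / len(region)


def is_rich_region(region: str, base: str, min_fraction: float = 0.5) -> bool:
    """True when `base` makes up at least `min_fraction` of `region`.

    min_fraction = 0.5, 0.6, 0.7 or 0.8 corresponds to the 50%, 60%, 70%
    and 80% thresholds; the region boundaries are an input assumption.
    """
    return base_fraction(region, base) >= min_fraction


if __name__ == "__main__":
    # Adenine rich region at the 3' end: 8 of 10 bases are A.
    print(is_rich_region("AAAAGAAACA", "A", 0.8))
    # Thymine rich region at the 5' end: 4 of 8 bases are T.
    print(is_rich_region("TTTTGGCC", "T", 0.6))
```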

7. (currently amended) The method of Claim 1 wherein a heuristic score n_A / (n_A +
0.5*(max(n_R - 20, 0))) is used for detecting adenine rich or thymine rich region; wherein n_A
is the number of adenines or thymines in the block, and n_R is the number of bases
downstream of the block of adenines or [[thymine]] thymines to the end of the sequence.

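
As a minimal sketch of the heuristic score recited above, assuming the block of adenines or thymines has already been located and counted, the score can be computed as follows. The function name `heuristic_score` and the zero-denominator guard are assumptions made for illustration.

```python
def heuristic_score(n_a: int, n_r: int) -> float:
    """Score n_A / (n_A + 0.5 * max(n_R - 20, 0)).

    n_a: number of adenines (or thymines) in the block.
    n_r: number of bases between the block and the end of the sequence.
    Up to 20 trailing bases carry no penalty; each further base adds 0.5
    to the denominator.
    """
    penalty = 0.5 * max(n_r - 20, 0)
    denominator = n_a + penalty
    # Guard for an empty block with no penalty (assumption: score is 0).
    return n_a / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    print(heuristic_score(20, 10))  # block close to the end: no penalty
    print(heuristic_score(20, 40))  # block 40 bases from the end: penalized
```

A score of 1.0 means the block lies within 20 bases of the sequence end; the score falls toward 0 as the block moves further from the end.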
8. (currently amended) A method for detecting polyadenylation signal in a sequence
with a polyadenylation site comprising searching for a polyadenylation signal Hexamer in
the sequence [[before]] 5' of the polyadenylation site.



9. (currently amended) The method of Claim 8 wherein the searching comprises
evaluating the probability that there is a polyadenylation [[site]] signal: Pr(h=k|x) for
k=6,7,...,N, wherein the sequence [[before]] 5' of the polyadenylation site is x=(x_1, x_2, ...,
x_N) and where x_N is the most 3'-most base [[before]] upstream of the polyadenylation site.

10. (original) The method according to Claim 9 wherein Pr(h=k|x) = Pr(x|h=k)
Pr(h=k)/Pr(x).
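
The Bayes relation above can be sketched numerically as follows. This is an illustration under a stated assumption: the evidence term Pr(x) is expanded by the law of total probability over the two hypotheses h=k and h≠k, which the claim text does not spell out. The function name `posterior` and its parameter names are hypothetical.

```python
def posterior(lik_signal: float, lik_background: float, prior: float) -> float:
    """Pr(h=k|x) = Pr(x|h=k) Pr(h=k) / Pr(x).

    lik_signal:     Pr(x|h=k), likelihood if the hexamer is the signal.
    lik_background: Pr(x|h!=k), likelihood if it is not.
    prior:          Pr(h=k), prior probability of a signal at position k.
    Assumption: Pr(x) = Pr(x|h=k) Pr(h=k) + Pr(x|h!=k) (1 - Pr(h=k)).
    """
    evidence = lik_signal * prior + lik_background * (1.0 - prior)
    if evidence == 0.0:
        return 0.0
    return lik_signal * prior / evidence


if __name__ == "__main__":
    # A hexamer eight times more likely under the signal model, with an even prior.
    print(posterior(0.8, 0.1, 0.5))
```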

11. (original) The method of Claim 10 wherein Pr(h=k|x) = Pr(x_{k-5}, ..., x_k | h=k)
Pr(h=k)/Pr(x_{k-5}, ..., x_k) and wherein Pr(h=k) is the probability that the polyadenylation
hexamer is located at position k in the sequence, at a distance (N-k) from the
polyadenylation site, Pr(x_{k-5}, ..., x_k | h=k) is the probability of observing the hexamer
(x_{k-5}, ..., x_k) given that it is a polyadenylation signal and Pr(x_{k-5}, ..., x_k | h≠k) is the
probability of observing the hexamer given that it is not from the polyadenylation signal.

12. (currently amended) The method of Claim 11 wherein the step of detecting
comprises using a gamma function to produce a density distribution which places the
majority of its weight on the positions located 5 to 25 bases distant from the
polyadenylation site.

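
One way to picture a gamma-shaped position prior is the sketch below. The shape and scale values (9 and 5/3, giving a mean distance of 15 bases) are illustrative assumptions chosen so that most of the weight falls 5 to 25 bases from the site; they are not taken from the claims. The names `gamma_pdf` and `position_prior` are hypothetical.

```python
import math


def gamma_pdf(d: float, shape: float, scale: float) -> float:
    """Gamma density at distance d, evaluated in log space for stability."""
    if d <= 0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(d) - d / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def position_prior(n: int, shape: float = 9.0, scale: float = 5.0 / 3.0) -> dict:
    """Pr(h=k) for k = 6..n, where the distance to the site is n - k.

    The density is sampled at integer distances and renormalized so the
    prior sums to 1 over the candidate positions (an assumption).
    """
    weights = {k: gamma_pdf(n - k, shape, scale) for k in range(6, n + 1)}
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("sequence too short for a non-degenerate prior")
    return {k: w / total for k, w in weights.items()}


if __name__ == "__main__":
    prior = position_prior(60)
    mass = sum(p for k, p in prior.items() if 5 <= 60 - k <= 25)
    print(f"prior mass 5 to 25 bases from the site: {mass:.3f}")
```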
13. (original) The method of Claim 12 wherein Pr(x_{k-5}, ..., x_k | h≠k), the probability of
observing the hexamer given that it is not from a polyadenylation signal, is modeled
using a second-order Markov model trained on data collected from human 3'UTRs.



14. (original) The method of Claim 13 wherein Pr(x_{k-5}, ..., x_k | h≠k) = Pr(x_{k-5})
Pr(x_{k-4} | x_{k-5}) Pr(x_{k-3} | x_{k-5}, x_{k-4}) Pr(x_{k-2} | x_{k-4}, x_{k-3})
Pr(x_{k-1} | x_{k-3}, x_{k-2}) Pr(x_k | x_{k-2}, x_{k-1}) wherein the first
term is a zero-order Markovian probability, the second is a first-order Markovian
probability and the remaining four terms are second-order Markovian probabilities.

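
The six-term factorization above can be sketched as a product over lookup tables. The table layout is an assumption for illustration: `p0` maps a base to its zero-order probability, `p1` maps a two-letter string "ab" to Pr(b|a), and `p2` maps a three-letter string "abc" to Pr(c|a,b). The function name `hexamer_background_prob` is hypothetical.

```python
def hexamer_background_prob(hexamer: str, p0: dict, p1: dict, p2: dict) -> float:
    """Probability of a hexamer under a second-order Markov background.

    Term 1 is zero-order, term 2 is first-order, and terms 3 to 6 are
    second-order, each conditioned on the two preceding bases.
    """
    if len(hexamer) != 6:
        raise ValueError("expected a sequence of exactly six bases")
    prob = p0[hexamer[0]] * p1[hexamer[0:2]]
    for i in range(2, 6):
        # hexamer[i-2:i+1] is the two-base context followed by base i.
        prob *= p2[hexamer[i - 2:i + 1]]
    return prob


if __name__ == "__main__":
    bases = "ACGT"
    # Uniform tables, used only to exercise the function.
    p0 = {a: 0.25 for a in bases}
    p1 = {a + b: 0.25 for a in bases for b in bases}
    p2 = {a + b + c: 0.25 for a in bases for b in bases for c in bases}
    print(hexamer_background_prob("AATAAA", p0, p1, p2))
```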
15. (original) The method of Claim 14 wherein, for a kth-order Markov model, the
probability of base b following a word w of length k is estimated by the frequency of the
concatenated word (wb) divided by the frequency of the word w, where frequencies are
computed from the training dataset of 3'UTR sequences.

16. (original) The method of Claim 15 wherein, for the case k = 0 (a zero-order
Markovian model), the probability of base b is estimated by its frequency in the dataset
divided by the size of the dataset.
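
The frequency estimates described in Claims 15 and 16 can be sketched as follows, assuming the training dataset is a list of 3'UTR strings and that word occurrences are counted with overlap. The function names `count_words` and `conditional_prob` are hypothetical, and returning 0.0 for an unseen word is an assumption of this sketch.

```python
from collections import Counter


def count_words(sequences, length: int) -> Counter:
    """Count every overlapping word of the given length in the dataset."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - length + 1):
            counts[seq[i:i + length]] += 1
    return counts


def conditional_prob(sequences, word: str, base: str) -> float:
    """Estimate Pr(base | word) as freq(word + base) / freq(word).

    For an empty word (k = 0) the estimate is the base's frequency in the
    dataset divided by the dataset size in bases.
    """
    k = len(word)
    if k == 0:
        total = sum(len(s) for s in sequences)
        hits = sum(s.count(base) for s in sequences)
        return hits / total if total else 0.0
    freq_wb = count_words(sequences, k + 1)[word + base]
    freq_w = count_words(sequences, k)[word]
    return freq_wb / freq_w if freq_w else 0.0


if __name__ == "__main__":
    training = ["AATAAA", "AAAA"]  # toy data, not real 3'UTRs
    print(conditional_prob(training, "", "A"))    # zero-order estimate
    print(conditional_prob(training, "AA", "A"))  # second-order estimate
```

Because a word at the very end of a sequence is counted in freq(w) but has no following base, the estimates for a given word may sum to slightly less than 1; this follows from the literal ratio and is left uncorrected here.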

17. (currently amended) A computer readable medium comprising computer
executable instructions for performing the method comprising:
inputting a plurality of RNA transcript sequences or [[sequences derived from RNA
transcript sequences]], wherein at least one sequence has its poly A or poly T tract
sequence;

searching for a polyadenylation site, wherein the polyadenylation is an adenine rich
region at the 3' end of the sequence or a thymine rich region at the [[beginning]] 5' end of
the sequence;

detecting the presence of polyadenylation signals neighboring the polyadenylation site by
[[scanning]] analyzing [[the]] EST or RNA sequences or their corresponding genomic DNA
sequences.



18. (currently amended) The computer readable medium of Claim 17 wherein the step
of searching for a polyadenylation site [[comprising]] comprises [[scanning]] analyzing
sequences for adenine rich region at the 3' end of the sequence or a thymine rich region at
the [[beginning]] 5' end of the sequence.



19. (original) The computer readable medium of Claim 18 wherein the adenine rich
region comprises adenine in at least 50% of the region and the thymine rich region
comprises thymine in at least 50% of the region.

20. (original) The computer readable medium of Claim 19 wherein the adenine rich
region comprises adenine in at least 60% of the region and the thymine rich region
comprises thymine in at least 60% of the region.

21. (original) The computer readable medium of Claim 20 wherein the adenine rich
region comprises adenine in at least 70% of the region and the thymine rich region
comprises thymine in at least 70% of the region.

22. (original) The computer readable medium of Claim 21 wherein the adenine rich
region comprises adenine in at least 80% of the region and the thymine rich region
comprises thymine in at least 80% of the region.

23. (currently amended) The computer readable medium of Claim 17 wherein a
heuristic score n_A / (n_A + 0.5*(max(n_R - 20, 0))) is used for detecting adenine rich or
thymine rich region; wherein n_A is the number of adenines or thymines in the block, and
n_R is the number of bases [[after]] downstream of the block of adenines or [[thymine]]
thymines to the end of the sequence.

24. (currently amended) A computer readable medium comprising computer-
executable instructions for performing the method comprising: searching for a
polyadenylation signal hexamer in the sequence [[before]] 5' of the polyadenylation site.

25. (currently amended) The computer readable medium of Claim 24 wherein the
searching comprises evaluating the probability that there is a polyadenylation [[site]]
signal: Pr(h=k|x) for k=6,7,...,N, wherein the sequence [[before]] 5' of the polyadenylation
site is x=(x_1, x_2, ..., x_N) and where x_N is the 3'-most base [[before]] upstream of the
polyadenylation site.

26. (original) The computer readable medium of Claim 25 wherein: Pr(h=k|x) =
Pr(x|h=k) Pr(h=k)/Pr(x).

27. (original) The computer readable medium of Claim 26 wherein: Pr(h=k|x) =
Pr(x_{k-5}, ..., x_k | h=k) Pr(h=k)/Pr(x_{k-5}, ..., x_k) and wherein Pr(h=k) is the probability that the
polyadenylation hexamer is located at position k in the sequence, at a distance (N-k) from
the polyadenylation site, Pr(x_{k-5}, ..., x_k | h=k) is the probability of observing the hexamer
(x_{k-5}, ..., x_k) given that it is a polyadenylation signal and Pr(x_{k-5}, ..., x_k | h≠k) is the
probability of observing the hexamer given that it is not from the polyadenylation signal.

28. (currently amended) The computer readable medium of Claim 27 wherein the
step of detecting comprises using a gamma function to produce a density distribution
which places the majority of its weight on the positions located 5 to 25 bases distant from
the polyadenylation site.

29. (original) The computer readable medium of Claim 28 wherein Pr(x_{k-5},
..., x_k | h≠k), the probability of observing the hexamer given that it is not from a
polyadenylation signal, is modeled using a second-order Markov model trained on data
collected from human 3'UTRs.

30. (original) The computer readable medium of Claim 29 wherein Pr(x_{k-5},
..., x_k | h≠k) = Pr(x_{k-5}) Pr(x_{k-4} | x_{k-5}) Pr(x_{k-3} | x_{k-5}, x_{k-4})
Pr(x_{k-2} | x_{k-4}, x_{k-3}) Pr(x_{k-1} | x_{k-3}, x_{k-2}) Pr(x_k | x_{k-2}, x_{k-1})
wherein the first term is a zero-order Markovian probability, the second is a first-
order Markovian probability and the remaining four terms are second-order Markovian
probabilities.

31. (original) The computer readable medium of Claim 30 wherein, for a kth-order
Markov model, the probability of base b following a word w of length k is estimated by
the frequency of the concatenated word (wb) divided by the frequency of the word w,
where frequencies are computed from the training dataset of 3'UTR sequences.

32. (original) The computer readable medium of Claim 31 wherein, for the case k =
0 (a zero-order Markovian model), the probability of base b is estimated by its frequency
in the dataset divided by the size of the dataset.



33. (currently amended) A system comprising: a processor; and a memory coupled
with the processor, the memory storing a plurality of machine instructions that cause the
processor to perform logical steps of the method comprising:

inputting a plurality of RNA transcript sequences or Expressed Sequence Tag (EST)
[[sequences derived from RNA transcript sequences]], wherein at least one sequence has its
poly A or poly T tract sequence;

searching for a polyadenylation site, wherein the polyadenylation is an adenine rich
region at the 3' end of the sequence or a thymine rich region at the [[beginning]] 5' end of
the sequence;

detecting the presence of polyadenylation signals neighboring the polyadenylation site by
[[scanning]] analyzing [[the]] EST or RNA sequences or their corresponding genomic DNA
sequences.

34. (currently amended) The system of Claim 33 wherein the step of searching for a
polyadenylation site [[comprising]] comprises [[scanning]] analyzing the sequences for adenine
rich region at the 3' end of the sequence or a thymine rich region at the [[beginning]] 5' end
of the sequence.



35. (original) The system of Claim 34 wherein the adenine rich region comprises
adenine in at least 50% of the region and the thymine rich region comprises thymine in at
least 50% of the region.

36. (original) The system of Claim 35 wherein the adenine rich region comprises
adenine in at least 60% of the region and the thymine rich region comprises thymine in at
least 60% of the region.

37. (original) The system of Claim 36 wherein the adenine rich region comprises
adenine in at least 70% of the region and the thymine rich region comprises thymine in at
least 70% of the region.

38. (original) The system of Claim 37 wherein the adenine rich region comprises
adenine in at least 80% of the region and the thymine rich region comprises thymine in at
least 80% of the region.



39. (currently amended) The system of Claim 33 wherein a heuristic score n_A / (n_A +
0.5*(max(n_R - 20, 0))) is used for detecting adenine rich or thymine rich region; wherein
n_A is the number of adenines or thymines in the block, and n_R is the number of bases
[[after]] downstream of the block of adenines or thymines to the end of the sequence.

Sep-21-05 From Affymetrix, Inc.
Application No.: 10/028,416

40. (currently amended) A system comprising a processor, and a memory coupled
with the processor, the memory storing a plurality of machine instructions that cause the
processor to perform logical steps of the method for detecting polyadenylation signal in a
sequence with a polyadenylation site comprising: searching for a polyadenylation signal
hexamer in the sequence [[before]] 5' of the polyadenylation site.



41. (currently amended) The system of Claim 40 wherein the searching comprises
evaluating the probability that there is a polyadenylation [[site]] signal: Pr(h=k|x) for
k=6,7,...,N, wherein the sequence [[before]] 5' of the polyadenylation site is x=(x_1, x_2, ...,
x_N) and where x_N is the most 3'-most base [[before]] upstream of the polyadenylation site.



42. (original) The system of Claim 41 wherein: Pr(h=k|x) = Pr(x|h=k) Pr(h=k)/Pr(x).

43. (original) The system of Claim 42 wherein Pr(h=k|x) = Pr(x_{k-5}, ..., x_k | h=k)
Pr(h=k)/Pr(x_{k-5}, ..., x_k) and wherein Pr(h=k) is the probability that the polyadenylation
hexamer is located at position k in the sequence, at a distance (N-k) from the
polyadenylation site, Pr(x_{k-5}, ..., x_k | h=k) is the probability of observing the hexamer
(x_{k-5}, ..., x_k) given that it is a polyadenylation signal and Pr(x_{k-5}, ..., x_k | h≠k) is the
probability of observing the hexamer given that it is not from the polyadenylation signal.

44. (currently amended) The system of Claim 43 wherein the step of detecting
comprises using a gamma function to produce a density distribution which places the
majority of its weight on the positions located 5 to 25 bases distant from the
polyadenylation site.



45. (original) The system of Claim 44 wherein Pr(x_{k-5}, ..., x_k | h≠k), the probability of
observing the hexamer given that it is not from a polyadenylation signal, is modeled
using a second-order Markov model trained on data collected from human 3'UTRs.



46. (original) The system of Claim 45 wherein Pr(x_{k-5}, ..., x_k | h≠k) = Pr(x_{k-5})
Pr(x_{k-4} | x_{k-5}) Pr(x_{k-3} | x_{k-5}, x_{k-4}) Pr(x_{k-2} | x_{k-4}, x_{k-3})
Pr(x_{k-1} | x_{k-3}, x_{k-2}) Pr(x_k | x_{k-2}, x_{k-1}), wherein the first
term is a zero-order Markovian probability, the second is a first-order Markovian
probability and the remaining four terms are second-order Markovian probabilities.



47. (original) The system of Claim 46 wherein, for a kth-order Markov model, the
probability of base b following a word w of length k is estimated by the frequency of the
concatenated word (wb) divided by the frequency of the word w, where frequencies are
computed from the training dataset of 3'UTR sequences.

48. (original) The system of Claim 47 wherein, for the case k = 0 (a zero-order
Markovian model), the probability of base b is estimated by its frequency in the dataset
divided by the size of the dataset.